Introduction

An R package for MAVE QC


Analysis

A simple example for QC


Screen QC

What’s Screen QC?


Load Library

You may need to load dependencies first; please refer to the installation instructions in the README.

library(MAVEQC)


Import Input Files

This creates a list of objects, one per sample, containing all the input datasets.

sge_objs <- import_sge_files("/path/to/input/directory", "sample_sheet.tsv")
## Importing files for samples:
##     |--> hgsm3_d0_r1
##     |--> hgsm3_d4_r1
##     |--> hgsm3_d7_r1
##     |--> hgsm3_d15_r1
##     |--> hgsm3_d4_r2
##     |--> hgsm3_d7_r2
##     |--> hgsm3_d15_r2
##     |--> hgsm3_d4_r3
##     |--> hgsm3_d7_r3
##     |--> hgsm3_d15_r3
sge_objs[[1]]
## An object of class SGE
## |--> sample name: hgsm3_d0_r1
## |--> library type: screen
## |--> library name: SMARCA4_exon26
##     |--> 5' adaptor: CTGACTGGCACCTCTTCCCCCAGGA
##     |--> 3' adaptor: CCCCGACCCCTCCCCAGCGTGAATG
##     |--> ref seq: CCGGTGCTGGGCTCACCTCATCCTGCTCCTCGTGCTCCAGGATGGCCTGCAGGAAGGCGCGCCGCTCATGGCTGGAGGACTTCTGGTCGAACATGCCGGCCTGGATCACCTTCTGGTCCACGTTGAGCTTGTACTTGGCTGCAGCTAGGATCTTCTCCTCCACGCTGTTGACGGTGCAGAGGCGGAGCACACGCACCTCGTTCTGCTGCCCGATGCGGTGGGCTCGGTCCTGCGCTTGCAGG
##     |--> pam seq: CCGGTGCTGGGCTCACCTCATCCTGCTCCTCGTGCTCCAGGATGGCCTGCAGGAAGGCGCGCCGCTCATGGCTGGAGGACTTCTGGTCGAACATGCCGGCCTGGATCACCTTCTGGTCCACGTTGAGCTTATATTTAGCTGCAGCTAGGATCTTCTCCTCCACGCTGTTGACGGTGCAGAGGCGGAGCACACGCACCTCGTTCTGCTGCCCGATGCGGTGGGCTCGGTCCTGCGCTTGCAGG
##     |--> No. of library-dependent counts: 3273
##     |--> No. of library-independent counts: 370889
## |--> valiant meta: 3273 records and 24 fields
##     |--> 3273 library-dependent count ids matched in valiant meta oligo names


Run Sample QC

Sample QC requires a QC object. create_sampleqc_object creates one from the list of imported objects.

Sample QC also needs reference samples for guidance. They can be provided either as a vector of sample names or as a vector of sample indices in the sample sheet.
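The two ways of naming reference samples point at the same samples when the indices follow the sample order shown in the import log. A minimal base R sketch of that correspondence, using a plain character vector rather than the SGE objects themselves (the variable names sample_names, ref_index, ref_names and ref_by_name are illustrative and not part of MAVEQC):

```r
# Sample names in the order printed by import_sge_files above.
sample_names <- c("hgsm3_d0_r1", "hgsm3_d4_r1", "hgsm3_d7_r1", "hgsm3_d15_r1",
                  "hgsm3_d4_r2", "hgsm3_d7_r2", "hgsm3_d15_r2",
                  "hgsm3_d4_r3", "hgsm3_d7_r3", "hgsm3_d15_r3")

# Reference samples chosen by position ...
ref_index <- c(2, 5, 8)
ref_names <- sample_names[ref_index]
print(ref_names)

# ... or by name; match() recovers the same positions.
ref_by_name <- c("hgsm3_d4_r1", "hgsm3_d4_r2", "hgsm3_d4_r3")
print(match(ref_by_name, sample_names))
```

Here indices 2, 5 and 8 correspond to the three day 4 replicates, which are the reference samples reported by the QC object.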

samqc <- create_sampleqc_object(sge_objs)
samqc@samples_ref <- select_objects(sge_objs, c(2,5,8))
samqc <- run_sample_qc(samqc, "screen")
## Filtering by the total number of reads...
## Filtering by low counts...
##     |--> Creating k-means clusters...
##     |--> Filtering using clusters...
##         |--> Filtering on hgsm3_d0_r1
##         |--> Filtering on hgsm3_d4_r1
##         |--> Filtering on hgsm3_d7_r1
##         |--> Filtering on hgsm3_d15_r1
##         |--> Filtering on hgsm3_d4_r2
##         |--> Filtering on hgsm3_d7_r2
##         |--> Filtering on hgsm3_d15_r2
##         |--> Filtering on hgsm3_d4_r3
##         |--> Filtering on hgsm3_d7_r3
##         |--> Filtering on hgsm3_d15_r3
## Filtering by depth and percentage in samples...
## Filtering by library mapping...
## Filtering by library coverage...
## Sorting library counts by position...
##     |--> Sorting on hgsm3_d0_r1
##     |--> Sorting on hgsm3_d4_r1
##     |--> Sorting on hgsm3_d7_r1
##     |--> Sorting on hgsm3_d15_r1
##     |--> Sorting on hgsm3_d4_r2
##     |--> Sorting on hgsm3_d7_r2
##     |--> Sorting on hgsm3_d15_r2
##     |--> Sorting on hgsm3_d4_r3
##     |--> Sorting on hgsm3_d7_r3
##     |--> Sorting on hgsm3_d15_r3
## Calculating gini coefficiency...
## Mapping consequencing annotation...
samqc
## An object of class sampleQC
## |--> samples: 
##     |--> hgsm3_d0_r1
##     |--> hgsm3_d4_r1
##     |--> hgsm3_d7_r1
##     |--> hgsm3_d15_r1
##     |--> hgsm3_d4_r2
##     |--> hgsm3_d7_r2
##     |--> hgsm3_d15_r2
##     |--> hgsm3_d4_r3
##     |--> hgsm3_d7_r3
##     |--> hgsm3_d15_r3
## |--> reference samples: 
##     |--> hgsm3_d4_r1
##     |--> hgsm3_d4_r2
##     |--> hgsm3_d4_r3
## |--> QC results: 
##     |--> hgsm3_d0_r1: TRUE
##     |--> hgsm3_d4_r1: TRUE
##     |--> hgsm3_d7_r1: TRUE
##     |--> hgsm3_d15_r1: TRUE
##     |--> hgsm3_d4_r2: TRUE
##     |--> hgsm3_d7_r2: TRUE
##     |--> hgsm3_d15_r2: TRUE
##     |--> hgsm3_d4_r3: TRUE
##     |--> hgsm3_d7_r3: TRUE
##     |--> hgsm3_d15_r3: TRUE
##     |--> NA: NA
##     |--> NA: NA
##     |--> NA: NA
##     |--> NA: NA
##     |--> NA: NA
##     |--> NA: NA
##     |--> NA: NA
##     |--> NA: NA
##     |--> NA: NA
##     |--> NA: NA
##     |--> NA: NA


Create Plots & Tables

Plotting requires the QC object and an output directory, passed as plotdir.
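The calls in this section use a variable named outputdir that is not defined in this walkthrough. One possible way to set it up with base R is sketched here; the path is a placeholder assumption and should be replaced with any writable directory:

```r
# Hypothetical location for the plots; replace with a directory of your choice.
outputdir <- file.path(tempdir(), "maveqc_plots")

# Create the directory (and any missing parents) if it does not exist yet.
dir.create(outputdir, recursive = TRUE, showWarnings = FALSE)
```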


  • Read Length Distribution
qcplot_readlens(samqc, plotdir = outputdir)
qcout_sampleqc_length(samqc)


  • Total Reads
qcplot_stats_total(samqc, plotdir = outputdir)
qcout_sampleqc_total(samqc)
  • Accepted Reads
qcplot_stats_accepted(samqc, plotdir = outputdir)
qcout_sampleqc_library(samqc)
qcout_sampleqc_cov(samqc)


  • Genomic Coverage
qcplot_position(samqc, "screen", plotdir = outputdir)

qcout_sampleqc_pos_cov(samqc)


  • Genomic Position Percentage
qcplot_position_anno(samqc, c("hgsm3_d4_r1", "hgsm3_d4_r2", "hgsm3_d4_r3"), type = "lof", plotdir = outputdir)

qcout_sampleqc_pos_per(samqc)


Run Experiment QC

DESeq2 requires coldata, a table describing the replicate and condition of each sample. For example:

              replicate  condition
hgsm3_d4_r1   R1         D4
hgsm3_d7_r1   R1         D7
hgsm3_d15_r1  R1         D15
hgsm3_d4_r2   R2         D4
hgsm3_d7_r2   R2         D7
hgsm3_d15_r2  R2         D15
hgsm3_d4_r3   R3         D4
hgsm3_d7_r3   R3         D7
hgsm3_d15_r3  R3         D15
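The table above can be built as a data frame with sample names as row names. This is a minimal sketch in base R; storing the columns as factors and listing D4 as the first level are assumptions, since the expected column types are not stated here:

```r
# Sample names, in the same order as the coldata table.
samples <- c("hgsm3_d4_r1", "hgsm3_d7_r1", "hgsm3_d15_r1",
             "hgsm3_d4_r2", "hgsm3_d7_r2", "hgsm3_d15_r2",
             "hgsm3_d4_r3", "hgsm3_d7_r3", "hgsm3_d15_r3")

# One row per sample; row names carry the sample names.
coldata <- data.frame(
  replicate = factor(rep(c("R1", "R2", "R3"), each = 3)),
  condition = factor(rep(c("D4", "D7", "D15"), times = 3),
                     levels = c("D4", "D7", "D15")),
  row.names = samples
)
print(coldata)
```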

run_experiment_qc runs the DESeq2 analysis using the conditions in coldata.

expqc <- create_experimentqc_object(samqc, coldata, "D4")
expqc <- run_experiment_qc(expqc)
## Running control deseq2 to get size factor...
## Running deseq2 on all the filtered samples...


Create Plots & Tables

Plotting requires the QC object and an output directory, passed as plotdir.


  • Sample Correlations
qcplot_dist_samples(expqc, plotdir = outputdir)
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.


  • Sample PCA
qcplot_pca_samples(expqc, ntop = 500, plotdir = outputdir)
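The ntop argument is not described in this walkthrough. Assuming it has the same meaning as the argument of the same name in DESeq2's plotPCA, only the ntop most variable features are used for the PCA. The sketch below illustrates that idea in base R on simulated data; it is not the MAVEQC implementation, and the names mat, row_vars, keep, pca and pct_var are illustrative:

```r
set.seed(1)

# Simulated matrix standing in for transformed counts: 2000 features x 9 samples.
mat <- matrix(rnorm(2000 * 9), nrow = 2000,
              dimnames = list(NULL, paste0("s", 1:9)))

ntop <- 500

# Rank features by their variance across samples and keep the ntop most variable.
row_vars <- apply(mat, 1, var)
keep <- order(row_vars, decreasing = TRUE)[seq_len(min(ntop, nrow(mat)))]

# PCA with samples as observations, so the matrix is transposed.
pca <- prcomp(t(mat[keep, ]))

# Percentage of variance explained by each principal component.
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(pct_var[1:2], 1))
```

Restricting the PCA to high-variance features is a common way to reduce the influence of features that barely change between samples.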


  • Fold Changes
qcplot_deseq_fc(expqc, plotdir = outputdir)